Kenneth Arrow, Medical Uncertainty, and the False Dream of Healthcare as a Normal Market
Kenneth Arrow’s great insight was not that healthcare markets are imperfect; it was that their imperfections are not decorative scratches on an otherwise respectable machine. They are the machine. In “Uncertainty and the Welfare Economics of Medical Care,” published in 1963, Arrow explained why medical care does not behave like ordinary commerce with a waiting room attached. The patient does not arrive as a calm shopper comparing toaster warranties. The patient arrives frightened, time-poor, physiologically compromised, financially exposed, and forced to make decisions under conditions where the seller knows vastly more than the buyer, the outcome is uncertain, and the purchase itself may be inseparable from trust.
This is why Arrow still matters to Healthcare Information Technology [Healthcare IT, the digital and operational systems used to run healthcare delivery, administration, analytics, and research]. He was writing before the modern Electronic Health Record [EHR, the clinical system used to document patient care], before Health Level Seven version 2 [HL7 v2, the older but still widely used messaging standard that moves healthcare events between systems], before Fast Healthcare Interoperability Resources [FHIR, a modern web-oriented standard for representing and exchanging healthcare data], before risk adjustment, prior authorization portals, algorithmic triage, revenue cycle analytics, and cheerful dashboards that make hospital misery look like airport logistics. Yet the paper reads as though it has wandered into the present wearing a sensible suit and found all of us still arguing in the corridor.
Arrow begins from welfare economics, which asks when markets allocate resources efficiently. In the tidy competitive model, buyers know what they want, sellers compete, prices carry information, contracts are enforceable, and people can walk away. Medical care violates this arrangement with the enthusiasm of a Bengal monsoon entering through a cracked roof. Illness is unpredictable. Treatment effectiveness is uncertain. The patient cannot reliably judge the quality of diagnosis before receiving it, and often cannot judge it afterward. A physician’s advice is not merely another product; it shapes the demand for the product. The person recommending the operation may also be the person paid to perform it. That does not prove corruption. It proves structural danger.
The ordinary market assumes that the buyer’s demand comes from the buyer. In healthcare, demand is often co-authored by the clinician, the insurer, the hospital, the available technology, the coding rules, the fear of liability, the patient’s social support, and the little bureaucratic ghosts living in coverage policy. The market is not absent, but it is tangled in a larger arrangement of trust, authority, ethics, regulation, subsidy, and professional control. Arrow’s non-obvious architectural lesson is that healthcare institutions are not merely market participants; they are compensating structures built because the market’s standard assumptions fail. Licensure, medical ethics, insurance, nonprofit hospitals, accreditation, utilization review, clinical guidelines, and documentation standards are not accidental ornaments. They are load-bearing substitutes for knowledge that patients cannot possess at the moment they need it most.
That point should unsettle anyone building modern healthcare platforms. A great deal of Healthcare IT pretends, often silently, that data generated inside this distorted market can be treated as if it were a clean record of clinical reality. But EHR data is not reality. It is reality after passing through workflow, reimbursement, professional habit, interface constraints, local templates, regulatory fear, and the practical need to finish a note before lunch turns into dinner. A diagnosis code may represent a confirmed disease, a suspected condition, a billing necessity, a rule-out, a historical problem never removed, or a phrase dragged forward by copy-and-paste like a small dead animal tied to the bumper. The table says “problem list.” The world says “maybe.”
This is where Arrow’s paper becomes more than health economics. It becomes a theory of healthcare representation. Healthcare data is produced under uncertainty, and then later reused as if the uncertainty evaporated during extraction. The original physician may have known that a diagnosis was provisional. The nurse may have known that the medication list was wrong because the patient brought three plastic bags of pills and one contained tablets from 2017. The billing team may have known that the code was the nearest reimbursable object, not the finest clinical description. The interface engine may have known nothing at all, because interface engines are loyal mules: they carry what they are given, even when what they are given is semantically drunk.
This is why the distinction between data transport and semantic meaning matters so much. HL7 v2 can move an Admission, Discharge, Transfer [ADT, the message type that announces patient movement events] message from one system to another. It can move an observation result, an order, or a patient demographic update. FHIR can expose cleaner resources through application programming interfaces [APIs, defined ways for software systems to request and exchange data]. Clinical Document Architecture [CDA, a document-based standard for exchanging clinical summaries] can package narrative and structured sections into a recognizable clinical artifact. But transport is not meaning. A message can arrive perfectly and still lie by omission. A FHIR Condition resource can be syntactically valid and still fail to communicate whether the diagnosis is active, suspected, historical, patient-reported, billing-derived, clinician-confirmed, or merely inherited from a previous system like a family curse.
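A minimal sketch of what that ambiguity looks like in practice. Both resources below are syntactically plausible FHIR R4 Conditions; the element names `clinicalStatus` and `verificationStatus` are real FHIR fields, but the sample data and the checker function are hypothetical illustrations, not any vendor's implementation.

```python
# Two syntactically plausible FHIR R4 Condition resources.
# clinicalStatus and verificationStatus are genuine FHIR elements;
# the data and the checker below are illustrative only.

confirmed = {
    "resourceType": "Condition",
    "code": {"coding": [{"system": "http://snomed.info/sct",
                         "code": "44054006",
                         "display": "Diabetes mellitus type 2"}]},
    "clinicalStatus": {"coding": [{"code": "active"}]},
    "verificationStatus": {"coding": [{"code": "confirmed"}]},
}

ambiguous = {
    "resourceType": "Condition",
    "code": {"coding": [{"system": "http://snomed.info/sct",
                         "code": "44054006",
                         "display": "Diabetes mellitus type 2"}]},
    # No clinicalStatus, no verificationStatus: the resource validates,
    # but is this active, suspected, historical, or a legacy-feed relic?
}

def semantic_gaps(condition: dict) -> list:
    """Report which meaning-bearing fields a Condition resource omits."""
    return [field for field in ("clinicalStatus", "verificationStatus")
            if field not in condition]
```

A downstream consumer that treats both resources identically has already decided, silently, that the missing status does not matter.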
Healthcare organizations often call this a data quality problem. Sometimes it is. There are misspellings, duplicate patients, bad timestamps, stale medication lists, orphaned orders, mangled units of measure, and interface mappings that appear to have been done during an electrical storm. But many failures labeled “data quality” are actually representation failures. The data is not dirty because someone failed to scrub it. It is dirty because the system asked a complicated human event to fit inside a field that had no place for uncertainty, intent, provenance, workflow state, or disagreement. A missing lab value may mean the test was not ordered, not performed, performed outside the network, blocked by cost, available only as a scanned document, delayed in an interface queue, or clinically irrelevant. These are not the same fact. Treating them as the same null value is not cleaning the data; it is burying a small epistemological crime in Structured Query Language [SQL, the common language used to query relational databases].
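One way to stop burying that crime is to make the reason for absence a recorded fact rather than a bare null. The taxonomy and helper below are a hypothetical sketch, assuming a system that captures why a value is missing at the point where the gap is first known:

```python
from enum import Enum

class AbsenceReason(Enum):
    # Hypothetical taxonomy: the distinct facts one NULL can hide.
    NOT_ORDERED = "test was not ordered"
    NOT_PERFORMED = "ordered but not performed"
    OUT_OF_NETWORK = "performed outside the network"
    COST_BLOCKED = "blocked by cost or coverage"
    SCANNED_ONLY = "available only as a scanned document"
    IN_TRANSIT = "delayed in an interface queue"
    NOT_APPLICABLE = "clinically irrelevant for this patient"

def describe(record: dict) -> str:
    """A flat NULL becomes a stated fact about why the value is missing."""
    if record["value"] is not None:
        return f"{record['test']} = {record['value']}"
    return f"{record['test']} missing: {record['absence_reason'].value}"

hba1c = {"test": "HbA1c", "value": None,
         "absence_reason": AbsenceReason.COST_BLOCKED}
```

The point is not this particular enum; it is that "missing because unordered" and "missing because unaffordable" should never collapse into the same database value.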
Arrow’s argument also helps explain why insurance is not a mere payment wrapper. Insurance changes behavior because it changes the felt price of care, the administrative route into care, and the negotiating power around care. In theory, insurance protects patients against financial catastrophe. In practice, it also creates third-party control, documentation burden, medical necessity rules, coding incentives, prior authorization rituals, network effects, and the peculiar modern spectacle of clinicians writing notes for three audiences at once: the patient, the next clinician, and the payment machinery hiding behind the curtain. The EHR becomes the place where all three audiences collide. It is not a clinical diary. It is a transaction ledger, legal shield, care coordination tool, quality reporting substrate, billing instrument, research source, and increasingly a training ground for Artificial Intelligence [AI, computational systems that infer patterns or generate outputs from data].
That last role is the dangerous one. AI in healthcare inherits Arrow’s world, but at machine speed. A clinical decision support system may appear to reason over patient facts, but it is often reasoning over facts that have already been filtered through asymmetry, uncertainty, coding incentives, and workflow shortcuts. A model trained to predict readmission may learn the hospital’s discharge habits as much as the patient’s physiology. A risk score may discover that patients with poor access to care generate fewer documented diagnoses and therefore appear deceptively healthy. A documentation assistant may make a note more fluent while preserving the old ambiguity in smoother grammar, like polishing a cracked teacup and declaring victory over ceramics.
The central design mistake is believing that better computation fixes malformed representation. It does not. It scales it. If the EHR treats “absence of evidence” as “evidence of absence,” AI will do the same with better typography. If a data warehouse collapses physician uncertainty into binary flags, the model will learn false certainty. If claims data treats reimbursed care as delivered need, population health analytics will confuse market access with disease burden. If a registry accepts only final diagnoses, it may erase the diagnostic journey, which is often where the most important clinical intelligence lives. Arrow’s patient was uncertain at the bedside. Our systems are uncertain in the database. The uncertainty did not disappear. It changed costume.
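The access-versus-burden confusion can be shown in a few lines. The cohort below is entirely hypothetical: two groups with identical true disease burden, where only documentation differs because access differs.

```python
# Hypothetical cohort: identical true disease burden, unequal access.
# "documented" counts only diagnoses that made it into the record.
cohort = [
    {"group": "high_access", "truly_ill": 40, "documented": 36, "n": 100},
    {"group": "low_access",  "truly_ill": 40, "documented": 18, "n": 100},
]

for g in cohort:
    g["apparent_prevalence"] = g["documented"] / g["n"]
    g["true_prevalence"] = g["truly_ill"] / g["n"]

# The low-access group appears half as sick, though the burden is identical.
# A model trained on "documented" learns access patterns, not physiology.
```

Any analytics pipeline that reads `apparent_prevalence` as disease burden has confused market access with need, exactly as the paragraph above warns.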
The practical implication is plain but hard: healthcare architecture must make uncertainty, provenance, and workflow context first-class design objects. Provenance means knowing where a data element came from, who or what asserted it, when it was asserted, under what workflow, for what purpose, and whether later evidence contradicted it. Temporal modeling matters because a diagnosis before treatment, during treatment, and after treatment is not the same object wearing three hats. Terminology mapping matters because Systematized Nomenclature of Medicine Clinical Terms [SNOMED CT, a clinical terminology for representing medical concepts], Logical Observation Identifiers Names and Codes [LOINC, a terminology for identifying lab tests and clinical observations], and International Classification of Diseases [ICD, a diagnosis classification system used heavily in billing and reporting] do different jobs. Pretending that a billing classification and a clinical ontology are interchangeable is like using a railway timetable as a map of human longing. It has information. It is not the thing.
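What "provenance as a first-class design object" might look like in data-model terms is sketched below. The `Assertion` wrapper and all its field names are hypothetical; the point is only that the value never travels without its origin, time, workflow, purpose, and contradiction history.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Assertion:
    """Hypothetical provenance wrapper: the value plus who asserted it,
    when, under what workflow, for what purpose, and whether later
    evidence contradicted it."""
    value: str
    asserted_by: str              # e.g. "clinician", "algorithm", "payer feed"
    asserted_at: datetime
    workflow: str                 # e.g. "ED triage", "claims ingestion"
    purpose: str                  # e.g. "clinical care", "billing"
    contradicted_by: list = field(default_factory=list)

dx = Assertion(
    value="suspected pneumonia",
    asserted_by="clinician",
    asserted_at=datetime(2024, 3, 1, 14, 30),
    workflow="ED triage",
    purpose="clinical care",
)
# Later evidence does not overwrite the assertion; it annotates it.
dx.contradicted_by.append("chest CT negative, 2024-03-02")
```

Note the design choice: contradiction is appended, not used to delete the original assertion, so the diagnostic journey survives in the record.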
The same warning applies to research systems. Clinical Data Interchange Standards Consortium [CDISC, a standards organization for clinical research data] and Study Data Tabulation Model [SDTM, a CDISC model for organizing clinical trial submission datasets] impose discipline, which is useful and necessary. But research data models often require cleaner, more bounded representations than routine care naturally produces. A trial dataset wants a defined variable. A clinic produces a trail of suspicion, treatment, patient adherence, insurance interruption, symptom drift, and documentation artifacts. Mapping routine care into research form is not simply extract-transform-load. It is a negotiation between the world as lived and the world as submitted. Some information becomes structured. Some becomes narrative. Some becomes “unknown.” Some vanishes with a faint administrative cough.
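The negotiation between world-as-lived and world-as-submitted can be made concrete. The function below is a hypothetical mapping, not SDTM itself: it forces a routine-care problem entry into a bounded trial-style variable and hands back whatever would not fit as narrative remainder.

```python
def to_trial_variable(problem: dict) -> tuple:
    """Hypothetical mapping from a routine-care problem entry to a
    bounded research variable. Returns (coded_value, narrative_remainder):
    what fits the form, and what survives only as free text."""
    status = problem.get("status")
    if status == "confirmed":
        return "Y", None
    if status == "refuted":
        return "N", None
    # Suspicion, adherence gaps, coverage interruptions: no box to tick.
    return "UNKNOWN", problem.get("note")

coded, remainder = to_trial_variable({
    "status": "suspected",
    "note": "started empirically; insurance lapsed mid-course",
})
```

The `UNKNOWN` branch is the administrative cough made visible: the trial dataset receives a clean value while the clinically interesting story is demoted to a note field, if it survives at all.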
CDA exposed this tension beautifully and painfully. Its narrative sections preserve clinical readability, while its structured entries support computation. That duality is not a defect; it is an honest admission that medical meaning often exceeds neat coding. FHIR improves granularity by representing smaller resources such as Patient, Observation, Condition, MedicationRequest, Encounter, and Procedure. That is useful. But resource granularity does not by itself solve semantic granularity. A Condition resource can still represent too many possible social and clinical meanings. FHIR profiles and implementation guides help by constraining use, but local workflow still leaks through. Standards can narrow ambiguity. They cannot abolish it by decree, any more than painting lane markers on a Kolkata street abolishes negotiation among buses, scooters, goats, pedestrians, and metaphysics.
Arrow also helps explain why trust remains central even in highly technical systems. Patients trusted physicians because they had little alternative. Modern systems ask clinicians to trust EHR summaries, medication reconciliation screens, outside records, AI suggestions, payer rules, and quality measure prompts. Each trust relation has its own failure mode. Outside records may be incomplete. Medication histories may be stitched together from pharmacy fills that do not prove ingestion. AI recommendations may be statistically plausible but clinically tone-deaf. Quality measures may reward documentation of an action more reliably than the action itself. The old asymmetry between doctor and patient has not vanished; it has multiplied into asymmetries among patient, clinician, institution, vendor, payer, regulator, and algorithm.
The organizational structure is encoded in the data. A hospital with strong specialty silos will produce fragmented records. A payer-driven environment will produce exquisitely coded billable events and surprisingly weak accounts of suffering that did not meet reimbursement grammar. A research hospital will preserve protocol-defined variables with priestly care while routine social context drifts in narrative fog. A fragmented Health Information Exchange [HIE, an infrastructure for sharing health information across organizations] may connect institutions technically while leaving identity matching, consent, provenance, and semantic reconciliation only partially solved. The data warehouse will then inherit not only data, but politics. Every table is a fossil bed of organizational compromise.
Why does the failure persist? Because a clean solution would require changing not only software, but incentives, liability, reimbursement, clinical time, governance, procurement, and professional power. No interface engine can repair a reimbursement system that rewards one representation over another. No terminology server can force clinicians to document uncertainty in a structured way if the user interface punishes them for doing so. No AI governance board can fully correct training data whose missingness reflects unequal access to care. No interoperability standard can make two organizations mean the same thing by “active problem” when their workflows, billing practices, and clinical cultures differ. The hard constraint is that healthcare is not one system. It is a federation of partially cooperating institutions that often exchange data because regulation, payment, or patient safety forces them to, not because their internal models of reality agree.
Still, the direction is not hopeless. It is just not clean. Architecturally, the goal should be layered honesty. Move the data, but do not confuse movement with understanding. Normalize where necessary, but preserve source context. Map terminologies, but record mapping confidence and intended use. Build canonical models, but allow local extensions where workflow reality genuinely differs. Separate clinical assertion from billing classification. Distinguish patient-reported, clinician-confirmed, algorithm-inferred, and payer-derived facts. Treat time as a first-class dimension, not a column reluctantly named “updated_date.” Design AI decision support so that it can show provenance, uncertainty, exclusion logic, and the evidence boundary around its own recommendation. A model that cannot explain what kind of data it is relying on should not be allowed to dress itself as clinical wisdom.
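Two of those principles, recording mapping confidence with intended use and distinguishing the origin of a fact, can be sketched together. The codes below are real (ICD-10-CM E11.9 and SNOMED CT 44054006 both denote type 2 diabetes), but the record structure, the confidence floor, and the gating function are hypothetical design illustrations.

```python
from dataclasses import dataclass
from enum import Enum

class Origin(Enum):
    # The distinctions named in the text, as an explicit tag.
    PATIENT_REPORTED = "patient-reported"
    CLINICIAN_CONFIRMED = "clinician-confirmed"
    ALGORITHM_INFERRED = "algorithm-inferred"
    PAYER_DERIVED = "payer-derived"

@dataclass(frozen=True)
class MappedCode:
    """Hypothetical mapping record: source code, target code, curator
    confidence, declared fitness for use, and the origin of the fact."""
    source_system: str
    source_code: str
    target_system: str
    target_code: str
    confidence: float        # 0.0-1.0, curator-assigned
    intended_use: tuple      # e.g. ("analytics",) or ("billing",)
    origin: Origin

def safe_for(mapping: MappedCode, use: str, floor: float = 0.8) -> bool:
    """Downstream gate: a mapping is usable only for its declared
    purposes and only above a confidence floor."""
    return use in mapping.intended_use and mapping.confidence >= floor

m = MappedCode("ICD-10-CM", "E11.9", "SNOMED CT", "44054006",
               confidence=0.9, intended_use=("analytics",),
               origin=Origin.PAYER_DERIVED)
```

The gate makes the essay's separation of billing classification from clinical assertion executable: a payer-derived map approved for analytics simply refuses to be used for anything else.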
Governance must also move closer to production reality. Data governance that lives only in committees becomes ceremonial incense. The real governance happens in interface specifications, value-set maintenance, order catalog design, note templates, default selections, identity matching rules, extract logic, and the quiet decision to overwrite or preserve a source value. Healthcare organizations need semantic stewardship: a durable function that connects clinicians, informaticists, integration engineers, data architects, compliance teams, researchers, and revenue cycle experts around the question, “What does this data element mean here, and what must no downstream user assume?” That question sounds modest. It is dynamite with reading glasses.
Arrow’s 1963 essay remains foundational because it refuses the cheap simplification. Healthcare is not exempt from economics, but neither is it reducible to ordinary market exchange. Trust, uncertainty, professional ethics, insurance, and institutional design are not soft background variables. They shape the data, the workflows, the contracts, the measurements, and now the algorithms. The modern EHR did not repeal Arrow. It computerized him. The AI era will not repeal him either. It will merely punish us more quickly if we forget that medical care is built on uncertain knowledge, asymmetric trust, and representations that always arrive carrying fingerprints from the institutions that made them.
The sober lesson for Healthcare IT is this: before asking whether a system can exchange, analyze, or predict, ask what kind of uncertainty it has already hidden. A normal market can survive a bad product review. A healthcare system can bury ambiguity inside a clean field, pass it across an interface, train a model on it, and return it to the bedside as advice. That is not merely a technical flaw. It is Arrow’s world, digitized, accelerated, and waiting for architects who understand that the deepest bugs in healthcare are often not in the code. They are in the representation of the human condition as if it were easier, cleaner, and more obedient than it has ever been.
References: Kenneth J. Arrow, “Uncertainty and the Welfare Economics of Medical Care,” The American Economic Review, Volume 53, Issue 5, December 1963, pages 941–973. Bulletin of the World Health Organization reprint and commentary series on Arrow’s role in the birth of health economics. Uncertain Times: Kenneth Arrow and the Changing Economics of Health Care, Duke University Press.